Effects of Candidatus Liberibacter asiaticus infection on metagenome of Diaphorina citri gut endosymbiont

Asian citrus psyllid (Diaphorina citri, D. citri) is an important vector of "Candidatus Liberibacter asiaticus" (CLas), which is associated with Huanglongbing (HLB), the most devastating citrus disease worldwide. CLas infection can affect the abundance of D. citri endosymbionts. Here, we generated high-quality gut endosymbiont metagenomes of CLas-infected and CLas-uninfected D. citri. The dataset comprised, on average, 6616.74 M and 6586.04 M of raw data from the CLas-uninfected and CLas-infected psyllid populations, respectively. Taxonomic analysis annotated a total of 1046 species: 10 archaea, 733 bacteria, 234 eukaryotes, and 69 viruses. Eighty genera were unique to CLas-infected D. citri. DIAMOND was used for functional annotation against several databases, including Nr, KEGG, eggNOG, and CAZy, which together annotated 84543 protein-coding genes. These datasets provide a resource for further study of the mechanisms underlying the interaction between CLas and D. citri.

In total, metagenomic sequencing produced 6175.81-7204.40 M of raw data per sample. After contaminated reads were filtered out, the six samples retained between 6168.97 M and 7193.66 M of clean data. By comparison, a previous study obtained only approximately 17.41-18.85 M of microbiota data from Asian citrus psyllid (ACP) by 16S rRNA amplicon sequencing, suggesting that our dataset is more complete 10. Metagenome assembly yielded 185390-196132 scaffolds per sample, with GC contents of 36.81-37.49% across the CLas-free and CLas-infected samples. All samples had Q20 values above 96%, Q30 values above 90%, and N50 values above 1000 bp (Table 2). Table 3 summarizes the composition of the gut endosymbiont communities in CLas-free and CLas-infected D. citri, and Table 4 lists the number of genes annotated in each functional database. Figure 1 summarizes the experimental workflow.

www.nature.com/scientificdata www.nature.com/scientificdata/

The endosymbiont orders identified as biomarkers differed between the two groups; Proteobacteria was the most abundant phylum and Wolbachia pipientis the most abundant species (Figs. 3, 4). KEGG orthology groups (KOs) in the metabolism category, such as those for amino acid, cofactor, vitamin, and glycan biosynthesis and metabolism, were the most abundant in the intestinal microbiota of both CLas-free and CLas-infected D. citri (Fig. 5). Five categories, including metabolism of other amino acids and metabolism of cofactors and vitamins, were enriched in CLas-free samples, whereas thirteen categories, including the immune and nervous systems, were enriched in CLas-infected samples. This suggests that the gut endosymbiont communities of the two groups contribute differently to host functional compensation (Fig. 6). These KOs were unevenly distributed among taxa. For instance, KOs in the metabolism category were enriched in the bacterial phyla Firmicutes and Proteobacteria and in the eukaryotic phylum Ascomycota (Fig. 7).
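The read-quality and assembly metrics reported above (Q20, Q30, GC content, and N50) follow standard definitions. The Python sketch below shows one way to compute them. It assumes Phred+33 quality encoding, counts a base toward Q20 or Q30 when its score is at or above the threshold, and computes GC content over A/C/G/T bases only (N is ignored). The function names are hypothetical, and this is an illustration, not the pipeline used to produce Table 2.

```python
from typing import Iterable, List


def q_fraction(quality_strings: Iterable[str], threshold: int, offset: int = 33) -> float:
    """Fraction of bases whose Phred score is >= threshold (Phred+33 assumed)."""
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0


def gc_content(sequences: Iterable[str]) -> float:
    """Percentage of G+C among A/C/G/T bases; N and other symbols are ignored."""
    gc = 0
    acgt = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        acgt += sum(s.count(base) for base in "ACGT")
    return 100.0 * gc / acgt if acgt else 0.0


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that sequences of length >= L cover half the assembly."""
    ordered: List[int] = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0
```

For example, `n50([100, 200, 300, 400])` returns 300, because the two longest sequences (400 + 300 bp) are the first to cover half of the 1000 bp total.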
In summary, these metagenomic data provide insight into the composition and function of the intestinal endosymbionts of D. citri. They not only deepen our understanding of the coevolution of gut-associated endosymbionts and their insect host in distinct ecological niches but also open new avenues for pest management.

Methods
Psyllid sampling, tissue collection and DNA isolation. Newly emerged (five-day-old) adult D. citri were initially collected in 2020 from CLas-uninfected Citrus tachibana in Guangxi Province, China, and were subsequently reared on Murraya paniculata at the Guangxi Special Crops Research Institute for more than 15 generations. All cages and experimental treatments were maintained at 27 ± 1 °C and 60-70% relative humidity (RH) under a 14:10 h (light:dark) photoperiod. PCR detection confirmed that this D. citri population did not carry CLas.
A field population of newly emerged (five-day-old) CLas-infected D. citri was collected in Guilin City, China, in 2022. Ninety adult psyllids were randomly collected from CLas-infected Citrus tachibana to calculate the infection rate. DNA was extracted from each adult psyllid individually following Yu et al. 12 with slight modifications. Briefly, genomic DNA was extracted using the QIAGEN DNeasy Kit (QIAGEN, Hilden, Germany) according to the manufacturer's instructions. The HLB pathogen-specific primer set OI1/OI2c (forward primer 5′-GCGCGTATGCAATACGAGCGGCA-3′ and reverse primer 5′-GCCTCGCGACTTCGCAACCCAT-3′) was then used to amplify the target fragment, and the CLas infection rate was evaluated by agarose gel electrophoresis. When the proportion of CLas-infected individuals exceeded 80%, the population was considered CLas-infected 12.
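The infection-rate criterion reduces to a short calculation. In this sketch, each tested psyllid is recorded as PCR-positive or PCR-negative for the OI1/OI2c product. The function names and the example counts are hypothetical; only the 80% threshold comes from the text.

```python
from typing import Sequence

# Threshold from the text: a population counts as CLas-infected when
# more than 80% of individually tested psyllids are PCR-positive.
INFECTION_THRESHOLD = 0.80


def infection_rate(pcr_positive: Sequence[bool]) -> float:
    """Proportion of tested psyllids that gave a CLas-positive OI1/OI2c band."""
    if not pcr_positive:
        raise ValueError("no psyllids were tested")
    return sum(pcr_positive) / len(pcr_positive)


def is_clas_infected_population(pcr_positive: Sequence[bool],
                                threshold: float = INFECTION_THRESHOLD) -> bool:
    """True when the infection rate strictly exceeds the threshold."""
    return infection_rate(pcr_positive) > threshold
```

For example, if 76 of 90 tested psyllids were positive (a hypothetical count), the rate would be about 84.4% and the population would be classed as CLas-infected; 72 of 90 (exactly 80%) would not pass the strict threshold.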
To eliminate contamination by microbes from the host plant, psyllids were starved for 6 h. For metagenome sequencing, 150 newly emerged (five-day-old) psyllids from each of the CLas-free and CLas-infected populations were surface-sterilized in 70% ethanol for 60 s and rinsed three times with sterile water 13; three biological replicates were prepared for each group. All samples were then dissected to obtain the gut tissue, which was homogenized in 200 μl sterile water and frozen at -80 °C until DNA extraction. Microbial genomic DNA was extracted with the QIAGEN DNeasy Kit. DNA quality and concentration were assessed with an Agilent 2100 Bioanalyzer system. High-quality DNA was sent to Novogene Company (Beijing, China) for metagenomic sequencing.
Metagenome sequencing. The NEBNext® Ultra™ DNA Library Prep Kit for Illumina (NEB, USA) was used to construct the libraries. Genomic DNA was randomly sheared into short fragments, which were end-repaired, A-tailed, and ligated to Illumina adapters. The adapter-ligated fragments were then PCR-amplified, size-selected, and purified to complete library construction. Library quality was checked with a Qubit 2.0 Fluorometer (Thermo Fisher Scientific, Massachusetts, USA), and library quantity and size distribution were determined by real-time PCR and Bioanalyzer, respectively. Quantified libraries were pooled according to their effective concentrations and the required data amount, and sequenced on the Illumina PE150 platform.
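Pooling according to effective library concentration and required data amount is commonly done on a molar basis. The sketch below is a generic illustration, not the sequencing provider's protocol. It converts a mass concentration (ng/µl) and mean fragment length into molarity using the usual approximation of 660 g/mol per base pair, then assigns pool volumes so that each library's molar share matches its share of the requested data. Library names, target amounts, and the total pooled quantity are hypothetical. When the effective concentration is already known in nM (for example from real-time PCR), it can be passed directly to `pooling_volumes`.

```python
from typing import Dict

# Approximate mass of one base pair of double-stranded DNA (g/mol).
AVG_BP_MASS = 660.0


def ng_per_ul_to_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library's mass concentration to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def pooling_volumes(molarity_nM: Dict[str, float],
                    target_gb: Dict[str, float],
                    total_fmol: float) -> Dict[str, float]:
    """Volume (µl) of each library so its molar share equals its share of target data.

    Uses 1 nM == 1 fmol/µl, so volume = fmol required / nM.
    """
    total_target = sum(target_gb.values())
    volumes = {}
    for lib, gb in target_gb.items():
        fmol_needed = total_fmol * gb / total_target
        volumes[lib] = fmol_needed / molarity_nM[lib]
    return volumes
```

For instance, a 3.3 ng/µl library with a 500 bp mean fragment length corresponds to about 10 nM; two libraries at 10 nM and 5 nM requesting equal data would contribute 5 µl and 10 µl to a 100 fmol pool.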
Readfq (V8, https://github.com/cjfields/readfq) was used to preprocess the raw data from the Illumina sequencing platform and obtain clean data for subsequent analysis 14. MEGAHIT (v1.0.4-beta) was then used to assemble the clean data, and N-free scaftigs were obtained by breaking the resulting scaffolds at N junctions 15,16. MetaGeneMark (V3.05, http://topaz.gatech.edu/GeneMark/) was used to predict open reading frames (ORFs) in the scaftigs (≥ 500 bp) of each sample 14,17,18. CD-HIT (V4.5.8, http://www.bioinformatics.org/cd-hit/) was used to remove redundancy from the predicted ORFs and obtain a non-redundant initial gene catalogue 19. Clean data from each sample were aligned to the initial gene catalogue using Bowtie2 (v2.2.4, http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) to count the reads mapped to each gene in each sample. The abundance of each gene in each sample was calculated from the number of aligned reads and the gene length 20.
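Several steps of this pipeline reduce to simple operations on sequences and counts. The Python sketch below illustrates three of them under stated assumptions: a read filter with hypothetical thresholds (the actual Readfq criteria are not given here), breaking scaffolds at runs of N into scaftigs and keeping those of at least 500 bp for ORF prediction, and a length-normalized gene abundance. The abundance formula, (r_k / L_k) / Σ_i (r_i / L_i), where r_k is the number of reads aligned to gene k and L_k is its length, is a commonly used definition assumed here; the text states only that read counts and gene length were used. The sketch does not replace MEGAHIT, MetaGeneMark, CD-HIT, or Bowtie2.

```python
import re
from typing import Dict, List

# Runs of N (either case) mark gaps inside assembled scaffolds.
N_RUN = re.compile(r"N+", re.IGNORECASE)


def passes_read_filter(seq: str, qual: str, min_q: int = 20, max_low_q: int = 40,
                       max_n: int = 10, offset: int = 33) -> bool:
    """Hypothetical QC rule: reject reads with too many low-quality bases or Ns.

    Thresholds are illustrative placeholders, not the settings used by Readfq.
    """
    low_q = sum(1 for ch in qual if ord(ch) - offset < min_q)
    n_count = seq.upper().count("N")
    return low_q <= max_low_q and n_count <= max_n


def split_at_n(scaffold: str) -> List[str]:
    """Break a scaffold at every N junction, returning the N-free scaftigs."""
    return [piece for piece in N_RUN.split(scaffold) if piece]


def select_for_orf_prediction(scaftigs: List[str], min_len: int = 500) -> List[str]:
    """Keep scaftigs of at least min_len bp, the cut-off stated for ORF prediction."""
    return [s for s in scaftigs if len(s) >= min_len]


def relative_gene_abundance(read_counts: Dict[str, int],
                            gene_lengths: Dict[str, int]) -> Dict[str, float]:
    """Assumed length-normalized abundance: (r_k / L_k) / sum_i (r_i / L_i)."""
    per_base = {gene: read_counts[gene] / gene_lengths[gene] for gene in read_counts}
    total = sum(per_base.values())
    if total == 0:
        return {gene: 0.0 for gene in per_base}
    return {gene: value / total for gene, value in per_base.items()}
```

Normalizing by gene length keeps long genes from appearing more abundant simply because they attract more reads: two genes with 10 aligned reads each but lengths of 1000 bp and 500 bp receive relative abundances of 1/3 and 2/3.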